1. INTRODUCTION


Any program that processes unscreened input must expect the worst
about its input data. If the input data are as complex and, fortunately,
as well structured as the source language for a compiler, then a very
methodical approach must be used, the most reliable one being parsing
according to a grammar.

Compilers use parsers for two purposes: first, to check the source
program for syntactic correctness, and second, to control the sequence
of semantic actions effecting the translation of the source code into
intermediate or object code. This is normally done on several levels:
on the lexical level, to group characters into tokens; on the level of
declarations, which inform the compiler about symbols and do not generate
code (declaration level); and on the level of constructs that generate
code (expression and statement level).

In the TRIDENT Compiler, these three levels are handled by a scanner,
an LALR parser, and a context parser (see Section 5). Errors in
the source text can occur on any of these levels. Of course, errors
can also occur during the execution of semantic routines, since by no
means all aspects of a programming language are handled syntactically.

For example, all checking of compatible attributes of symbols such as
OWN, TYPE, PROCEDURE, ARRAY, etc., is implemented by an extra mechanism
outside the LALR parser. Similarly, all type checking of expressions
is done outside the context parser.

In general, error handling in semantic routines, or in parser support
routines such as get-next-item, is much simpler than error handling
within a parser, since those routines deal with a very narrow environment
that is well known to the programmer. Therefore, a general method for
error handling is to make the grammar used by the compiler more permissive
than the true grammar for the language. For example, the compiler grammar
will contain productions that allow different orderings of keywords in
declarations such as:

OWN INTEGER variable list, or

INTEGER OWN variable list

or it may allow any number of subscripts for a subscripted variable
instead of only one, two, or three. The relevant semantic routines will
contain checks for the restrictions according to the true grammar and
issue error messages accordingly. This approach is used extensively in the
TRIDENT Compiler on the declaration level, and to a lesser extent on
the expression and statement level.
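The permissive-grammar technique can be sketched as follows. This is an illustrative Python sketch, not TRIDENT code: the function names, the assumption that the true grammar requires OWN before the type keyword, and the limit of three subscripts are all hypothetical. The parser accepts either keyword order and any subscript count; the semantic routines then report violations.

```python
# Sketch: semantic routines enforcing "true grammar" restrictions after a
# deliberately permissive parse. The required keyword order (OWN before the
# type keyword) and the 1..3 subscript limit are hypothetical examples.
TYPE_KEYWORDS = {"INTEGER", "REAL"}
MAX_SUBSCRIPTS = 3


def check_declaration(keywords: list) -> list:
    """Check attribute keywords that the compiler grammar accepted in any order."""
    errors = []
    types = [i for i, k in enumerate(keywords) if k in TYPE_KEYWORDS]
    if "OWN" in keywords and types and keywords.index("OWN") > types[0]:
        errors.append("OWN must precede the type keyword")
    return errors


def check_subscripts(name: str, subscripts: list) -> list:
    """The compiler grammar accepts any number of subscripts; restrict it here."""
    if not 1 <= len(subscripts) <= MAX_SUBSCRIPTS:
        return [f"{name}: {len(subscripts)} subscripts, "
                f"expected 1 to {MAX_SUBSCRIPTS}"]
    return []


print(check_declaration(["INTEGER", "OWN"]))  # ['OWN must precede the type keyword']
print(check_subscripts("A", [1, 2, 3, 4]))    # ['A: 4 subscripts, expected 1 to 3']
```

Keeping these checks out of the grammar means the parser never enters an error state for them, so the message can name the exact restriction that was violated.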
A parser, on the other hand, operates in a much more general environment,
and it takes much analysis and work on the programmer's part
to classify error situations, to issue meaningful error messages, to
take corrective actions, and to recover from the error situation, that
is, to find a point at which to continue parsing. The primary goal of
error processing is to inform the user precisely about the causes of
errors in the program and to continue parsing the rest of the program
after errors have been found, in order to find other, independent errors
without producing cascading errors from those found so far. The purpose is
not to second-guess the real intentions of the user and to produce an
object program according to what the user probably meant. In accordance
with this philosophy, the TRIDENT Compiler will not generate any code once
a syntax error has been found on the expression and statement level.

The following basic definitions will be used: an initial substring
of the source text is said to contain a syntax error if it cannot be
completed to a sentence in the language. The last item of the smallest
initial substring with a syntax error is said to be the error item, and
its place in the source text is the error location.
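The definition of the error item can be made concrete for a toy language. The sketch below assumes, purely for illustration, the language of balanced parentheses (any other item is a neutral atom); for that language a prefix can be completed exactly when it never closes more parentheses than it has opened, so the error item is the first ")" that drives the nesting depth below zero.

```python
def error_location(items):
    """Return the index of the error item, or None if no prefix is in error.

    Toy language (an assumption for illustration): balanced parentheses,
    with any other item treated as a neutral atom. A prefix can be completed
    to a sentence exactly when it never closes more parentheses than it has
    opened, so the smallest erroneous prefix ends at the first ')' that
    drives the nesting depth below zero.
    """
    depth = 0
    for i, item in enumerate(items):
        if item == "(":
            depth += 1
        elif item == ")":
            depth -= 1
            if depth < 0:
                return i  # last item of the smallest erroneous prefix
    return None  # every prefix can still be completed (perhaps by adding ')')


print(error_location(list("(a))(")))  # 3
```

Note that an unfinished input such as "((a" has no error item under this definition: every prefix can still be completed.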

This paper will explain very briefly the overall error recovery
strategy within the LALR parser. The method is simple and fast, and it
works satisfactorily for a grammar that has almost no nested constructs.

For grammars containing mostly nested constructs, a more sophisticated
approach is appropriate. A promising approach would be to work out a
practical method based on the theoretical scheme for error recovery in
conjunction with LR parsers presented in [1].

The major part of this paper deals with a general method of classifying
error situations and corrective actions during error recovery in a
context parser. This method is based on a classification of terminals,
non-terminals, and heads of an operator grammar according to certain
syntactic characteristics. The scheme is explained in detail for the
TRIDENT Higher Level Language (THLL). It allows condensation [1, 2] of
input text to higher syntactic units if the error item starts a new
construct. However, no condensation of partially parsed input on the
parser stack is permitted. The reason for this strict condensation policy
is to preserve the integrity of the semantic environment. It is this
problem of aligning semantic and syntactic processing across syntax errors
that has received very little attention in the literature. Yet, in many
compilers, including the TRIDENT Compiler, it is important to continue
semantic processing even after syntax errors have been encountered, since
otherwise large classes of errors, e.g., type errors, would remain
undetected.

This error recovery scheme does not employ backtracking of parsed
input and does not pursue parallel parses. It is based on a systematic
analysis of grammar units that allows the major error recovery decisions
to be made ahead of time for all incorrect programs. It also allows an
error recovery table, containing encodings of all important classes of
grammar units, to be produced automatically from a grammar and certain
additional information. This aspect and the approach to preserving the
integrity of the semantic environment appear to be improvements on
existing techniques for syntax error recovery.


2. ERROR RECOVERY IN CONJUNCTION WITH AN LALR PARSER


An LALR parser can discover an error item as soon as this item is
seen. However, the item may be seen when the parser is in a lookahead
state. In this case, recognition of the error item is delayed, since the
LALR parsing tables are optimized in the sense that in lookahead states
only those items that discriminate between next states are considered
individually, while all other items, legal or not, correspond to a 0 entry
in the lookahead list and are treated alike by the lookahead state, causing
a transition to the same next state. Thus, instead of recognizing the
error item at this time, the parser will, on the assumption that the next
item was a legal item not listed in the lookahead list, eventually enter a
read state, possibly after a number of apply states, and then recognize
the error condition. Thus, an error condition is encountered if the
parser, being in a read state S, cannot read the next item.


The general approach in the TRIDENT Compiler is to supply the parser
with an item it can read. This is achieved by inserting an item from
the read list in the grammar tables into the source text. A special
algorithm, based on the concept of hard tokens (BEGIN, semicolon, END)
and on a priority of tokens, selects a token in the read list.
This algorithm has been tuned by trial and error through many experiments,
and it is the most important factor in the decent behavior of this error
recovery scheme. The algorithm may also decide not to select an item. In
that case, the error recovery routine will either discard the error item
or pop one element off the parser stack, making it the current state. The
latter choice is the first corrective action attempted by the simple
error recovery routine of a standard LALR parser [3].
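The shape of this recovery step can be sketched as follows. Everything concrete here is an assumption: the text does not give the tuned selection algorithm, so the priority values below are invented, and the rule "pop the stack if the error item is a hard token, otherwise discard it" is only one plausible way of choosing between the two fallback actions the text names.

```python
from typing import List, Optional

# Hypothetical tuning data: hard tokens and insertion priorities. The real
# TRIDENT values were found by experiment and are not given in the text.
HARD_TOKENS = {"BEGIN", "SEMIC", "END"}
PRIORITY = {"SEMIC": 3, "END": 2, "BEGIN": 1}


def choose_insertion(read_list: List[str]) -> Optional[str]:
    """Pick the highest-priority item the parser can read in state S, if any."""
    candidates = [t for t in read_list if t in PRIORITY]
    return max(candidates, key=PRIORITY.__getitem__) if candidates else None


def recover(stack: List[int], read_list: List[str], items: List[str]) -> str:
    """One corrective action; items[0] is the error item the parser cannot read."""
    token = choose_insertion(read_list)
    if token is not None:
        items.insert(0, token)  # the parser reads the inserted item next
        return "insert " + token
    if items[0] in HARD_TOKENS and len(stack) > 1:
        stack.pop()             # keep the hard token and back up one state
        return "pop"
    items.pop(0)                # otherwise discard the error item
    return "discard"
```

Hard tokens are kept rather than discarded because they usually delimit statements and blocks, which are good resynchronization points.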

When a particular error situation causes cascading problems, then
it may be possible to make local modifications to the grammar and add
error checks to relevant semantic routines. In this way, a small error
recovery routine can be tuned up into a powerful debugging aid.
3. OPERATOR GRAMMARS AND CONTEXT PARSERS


On the statement and expression level, the TRIDENT Compiler uses a
parser that is an improved variation of the classic transition matrix
parser described by D. Gries [4]. This section and the next briefly
describe the general behavior of this type of parser, called a context
parser, from both a syntactic and a semantic viewpoint.

Let $G = (V_N, V_T, P, Q)$ be an operator grammar and
$Q \rightarrow \mathrm{ALPHA}\; Z\; \mathrm{FINIS}$ the only production in P
that involves Q. The terminals ALPHA and FINIS serve as begin and end
markers of a sentence in L(G). We shall consider strings over
$V_T \cup V_{IN}$, where $V_{IN} \subseteq V_N$ is a set of input
non-terminals.

In practice, $V_{IN}$ will contain items such as variable-id,
procedure-id, etc. Theoretically, $V_{IN}$ can be any subset of $V_N$ and,
therefore, the parser described below will work for sentential forms.
Thus,

$$L(G) = \{\, \mathrm{ALPHA}\; S\; \mathrm{FINIS} \;:\; S \in (V_T \cup V_{IN})^{*} \text{ and } Z \Rightarrow^{*} S \,\}$$

Since G is an operator grammar, non-terminals will never be adjacent,
neither in the right sides of productions nor in any S with
ALPHA S FINIS in L(G). We view S as a sequence of items alternating
between terminals and non-terminals by filling in a NIL non-terminal,
used only for this purpose, between two terminals if they are adjacent.
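The NIL filling step is simple enough to show directly. In this Python sketch the function name and the way terminals are identified (membership in a given set) are assumptions for illustration. The output starts with a non-terminal, as in the initial configuration h0, NIL, S FINIS, because ALPHA is a terminal that precedes S.

```python
NIL = "NIL"


def alternate(items, terminals):
    """Fill in NIL so that S alternates non-terminal, terminal, ...

    The result starts with a non-terminal, matching the initial configuration
    h0, NIL, S FINIS. Adjacent non-terminals cannot occur in an operator
    grammar, so they are reported as an error.
    """
    out = []
    want_nonterminal = True
    for item in items:
        is_terminal = item in terminals
        if is_terminal and want_nonterminal:
            out.append(NIL)        # filler between two adjacent terminals
        elif not is_terminal and not want_nonterminal:
            raise ValueError(f"adjacent non-terminals before {item!r}")
        out.append(item)
        want_nonterminal = is_terminal
    return out


T = {"BEGIN", "=", "(", ")", "END", "FINIS"}
print(alternate(["BEGIN", "x", "=", "(", "y", ")", "END", "FINIS"], T))
# ['NIL', 'BEGIN', 'x', '=', 'NIL', '(', 'y', ')', 'NIL', 'END', 'NIL', 'FINIS']
```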


Consider a parser for L(G) that works as follows: a sentence
ALPHA S FINIS in L(G) is parsed by moving through a finite sequence of
configurations of the form

$$h_0 h_1 \ldots h_k,\;\; V,\;\; T\, y\, \mathrm{FINIS}$$

where

$h_0 \ldots h_k$ is a sequence of heads,

V is a non-terminal,

T is a terminal, and

y is a string, the remainder string not yet seen by the parser.

We call V the current non-terminal or the item being looked at,
$h_0 \ldots h_k$ its left context, and T y FINIS its right context. Each
head is an incomplete right part of a production ending in a terminal. A
head will be represented in this paper by an initial segment of a right
part, marked off at its end by a vertical bar, e.g.:

variable = |    or    FOR variable = e STEP |

The initial configuration is

$$h_0,\;\; \mathrm{NIL},\;\; S\, \mathrm{FINIS}$$

where $h_0$ is ALPHA |.

The final configuration is

$$h_0,\;\; Z,\;\; \mathrm{FINIS}$$

The parser is initialized such that on input w the initial configuration
is

$$h_0,\;\; \mathrm{NIL},\;\; w$$

Thus, a string w parsed successfully as a sentence will have the form

$$w = S\, \mathrm{FINIS}$$

where $Z \Rightarrow^{*} S$.


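The following moves or transitions from a configuration

$$h_{k-1} h_k,\;\; V,\;\; T\, V_1\, T_1\, y$$

are possible:

$$
\begin{array}{lll}
(1) & h_{k-1}\, X,\;\; V_1,\;\; T_1\, y & X = h_k V T \\
(2) & h_{k-1},\;\; X,\;\; T_1\, y & X = h_k V T,\; V_1 \text{ must be NIL} \\
(3) & h_{k-1},\;\; X,\;\; T\, V_1\, T_1\, y & X = h_k V \\
(4) & h_{k-1} h_k,\;\; X,\;\; T_1\, y & X = V T,\; V_1 \text{ must be NIL} \\
(5) & h_{k-1} h_k X,\;\; V_1,\;\; T_1\, y & X = V T
\end{array}
$$

(6) Exit parser if the current configuration is h0, Z, FINIS.

(7) Error condition; attempt to find a resume configuration
and continue.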
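Moves (1) to (5) can be written as operations on a configuration. This Python sketch is illustrative only: the Config layout, the function names, and the choice to keep NIL fillers inside heads are assumptions, and the caller must supply the left-side non-terminal X for moves 2 to 4 (in the TRIDENT parser that is the NEWX entry of the transition vector described below).

```python
from typing import NamedTuple, Optional

NIL = "NIL"


class Config(NamedTuple):
    heads: tuple  # h0 ... hk, each head a tuple of grammar symbols
    v: str        # current non-terminal V
    rest: tuple   # right context T V1 T1 ... FINIS (terminals at even indices)


def _require_nil(v1: str) -> None:
    if v1 != NIL:
        raise ValueError("moves 2 and 4 require V1 = NIL")


def move(c: Config, nsyn: int, lhs: Optional[str] = None) -> Config:
    """Apply syntactic transition NSYN (1..5); lhs is the non-terminal X of 2-4."""
    t, after_t = c.rest[0], c.rest[1:]       # after_t = V1 T1 y
    below, hk = c.heads[:-1], c.heads[-1]
    if nsyn == 1:                            # new top head X = hk V T
        return Config(below + (hk + (c.v, t),), after_t[0], after_t[1:])
    if nsyn == 5:                            # push new head X = V T
        return Config(c.heads + ((c.v, t),), after_t[0], after_t[1:])
    if lhs is None:
        raise ValueError("moves 2-4 need the left-side non-terminal X")
    if nsyn == 2:                            # reduce hk V T to X
        _require_nil(after_t[0])
        return Config(below, lhs, after_t[1:])
    if nsyn == 3:                            # reduce hk V to X; T is not read
        return Config(below, lhs, c.rest)
    if nsyn == 4:                            # reduce V T to X
        _require_nil(after_t[0])
        return Config(c.heads, lhs, after_t[1:])
    raise ValueError(f"no syntactic move for NSYN = {nsyn}")


# Parse ALPHA ( a ) FINIS, taking the sentence symbol Z -> ( a ).
c = Config((("ALPHA",),), NIL, ("(", "a", ")", NIL, "FINIS"))
c = move(c, 5)        # push head NIL ( ; V becomes a
c = move(c, 2, "Z")   # reduce NIL ( a ) to Z
print(c)  # Config(heads=(('ALPHA',),), v='Z', rest=('FINIS',))
```

The trace ends in the final configuration h0, Z, FINIS, where move (6) would exit the parser.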
No transitions according to a production $U \rightarrow V$ are ever made.
It is implied that the same transition with U as the non-terminal is to be
used when V is in the non-terminal position. This assumption is made to
reduce the size of the parsing tables. Grammars must satisfy certain
conditions to allow this parsing scheme; this is explained in more detail
in [5].

It is clear that for a given grammar, the number of all possible
configurations is, in general, not finite. Some finite procedure is
needed to compute the next configuration from the current one. For
programming languages, normally a very limited context determines the
transition uniquely. For the grammar of the TRIDENT Higher Level Language
(THLL), one head to the left and one terminal to the right of the
current non-terminal are sufficient. The THLL parser uses a set of
computer-generated tables to determine for each triple h, V, T the next
configuration. The transition information associated with each triple
consists of:

(1) NSYN = syntactic transition number (1 ≤ NSYN ≤ 6),

(2) NSEM = semantic action number,

(3) NEWX = new head or new non-terminal produced by this
transition,

and is called the transition vector for h, V, T. This is described in
detail in [5].
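A table lookup for transition vectors, including the fallback for single productions U → V described above, might look like the following Python sketch. The table entries, action numbers, and non-terminal names are hypothetical (the THLL tables are computer-generated), heads are encoded as strings ending in "|", and the CHAIN map assumes each V has at most one U with U → V.

```python
from typing import NamedTuple, Optional


class TransitionVector(NamedTuple):
    nsyn: int  # syntactic transition number
    nsem: int  # semantic action number
    newx: str  # new head or new non-terminal


# Tiny hand-written table with hypothetical entries and numbers; the THLL
# tables are computer-generated. Heads are written as strings ending in "|".
TABLE = {
    ("var = |", "E", "SEMIC"): TransitionVector(3, 21, "assign-e"),
    ("ALPHA |", "NIL", "BEGIN"): TransitionVector(5, 2, "BEGIN |"),
}

# Single productions U -> V, stored as V: U (hypothetical names). No table
# entries exist for V; the transition for U is used instead.
CHAIN = {"var-id": "primary", "primary": "E"}


def lookup(h: str, v: str, t: str) -> Optional[TransitionVector]:
    """Transition vector for the triple (h, V, T); None means error (case 7)."""
    seen = set()
    while v is not None and v not in seen:
        seen.add(v)
        tv = TABLE.get((h, v, t))
        if tv is not None:
            return tv
        v = CHAIN.get(v)  # retry with U for a production U -> V
    return None
```

Here the triple (var = |, var-id, SEMIC) has no entry of its own but resolves through var-id → primary → E, which is how omitting chain productions keeps the tables small.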

For our purpose here, it is only necessary to assume that the
parser runs through a sequence of configurations as described above and
that the next configuration is computed from the configuration base,
consisting of the current non-terminal V and some limited left and right
context. Such a parser is called a context parser, in contrast to an
LR(k) parser. We call the sequence of heads to the left of the current
non-terminal the head stack. It represents all incomplete right sides
of productions that wait for completion, in the reverse order in which
they were created.

To simplify the discussion, we assume that the configuration base
is a triple. However, the approach is valid in general.
4. SYNTAX DIRECTED SEMANTIC PROCESSING


The main purpose of parsing is to determine the syntactic correctness
of the input string and to guide semantic processing to effect the
translation of the input into a semantically equivalent output.

To achieve syntax directed semantic processing, a semantic
routine is associated with each parsing cycle. On the grammar level,
this means that a semantic action number NSEM is associated with each
head and with each complete right side. The semantic action
SEMANTICS(NSEM) is executed before the corresponding head becomes the new
head during the present reduction cycle, or before the corresponding right
side is reduced to the left-side non-terminal.

We define a grammar with semantic action numbers as follows: let

$$V \rightarrow V_1 T_1 V_2 T_2 \ldots V_k T_k V_{k+1}$$

be any production (some of the $V_i$'s may be missing) and let
$h_1, h_2, \ldots, h_k$ be the heads corresponding to the right side, from
left to right, i.e., $h_i = V_1 T_1 \ldots V_i T_i$. Then with this
production a sequence of numbers $n_1, n_2, \ldots, n_k, n$ is associated,
with the following significance:

$n_i$ ($1 \le i \le k$) specifies the semantic action to be taken when
$h_i$ is the new head of a syntactic reduction (cases 1 and 5).

n specifies the semantic action when V is the new non-terminal of a
syntactic reduction (cases 2, 3, and 4).
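The association of action numbers with heads can be sketched in a few lines of Python. The function names and the action numbers 31 to 34 are hypothetical; the example right side is a truncated FOR production chosen so that its last head is the "FOR variable = e STEP |" head used earlier in the text.

```python
def heads_of(rhs, terminals):
    """Heads h1..hk of a right side: each initial segment ending in a terminal."""
    return [tuple(rhs[: i + 1]) for i, sym in enumerate(rhs) if sym in terminals]


def attach_actions(rhs, terminals, numbers):
    """Associate n1..nk with the heads h1..hk and n with the reduction.

    numbers = [n1, ..., nk, n]; returns ({h_i: n_i}, n).
    """
    heads = heads_of(rhs, terminals)
    if len(numbers) != len(heads) + 1:
        raise ValueError("need one number per head plus one for the reduction")
    return dict(zip(heads, numbers)), numbers[-1]


head_actions, n = attach_actions(
    ["FOR", "variable", "=", "e", "STEP", "e"], {"FOR", "=", "STEP"},
    [31, 32, 33, 34])
for head, action in head_actions.items():
    print(" ".join(head), "|", action)
# FOR | 31
# FOR variable = | 32
# FOR variable = e STEP | 33
```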

It is well known that semantic processing is a major source of
difficulty in error recovery. Not only is it necessary after a syntax
error to search for a new sound syntactic situation from which to
continue, but it is equally important to establish a corresponding
semantic situation that makes sense as a basis for continuing semantic
processing.

It is assumed here that the semantic situation is completely
described by a small set of variables called the semantic environment.
For the TRIDENT Compiler, this set of variables consists of:

RTOP = a pointer to the top of the stack of operators,

DTOP = a pointer to the top of the stack of operands,

ICFL = a pointer to the last entry in the translated code,

NT = current non-terminal item, and

T = current terminal item.
The first three pointers must be reset to previous values, thus
effectively undoing semantics that have begun but could not be completed
properly. The other variables can be set to standard values.

Thus, in order to align semantic and syntactic processing across
syntax errors, the semantic environment is saved in certain syntactic
situations and is retrieved when this syntactic situation becomes a
resume point after processing a syntax error. In the TRIDENT Compiler,
the three pointers RTOP, DTOP, and ICFL of the semantic environment are
saved uniformly whenever a new head is put on the head stack (NSYN = 1
and NSYN = 5). Thus, the head stack contains with each head the semantic
environment that must be reestablished when the corresponding head
becomes the top of the head stack after a bracketed construct has
effectively been removed and replaced by a default value as a result of
a syntax error.
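Saving the environment with each head and restoring it on recovery can be sketched as below. The class and method names are hypothetical, and the pointers are modeled as plain integers; the point is that each head stack entry carries a copy of RTOP, DTOP, and ICFL taken when the head was pushed (NSYN = 5) or replaced (NSYN = 1).

```python
from dataclasses import dataclass, replace


@dataclass
class SemanticEnv:
    rtop: int  # top of the operator stack (R-stack)
    dtop: int  # top of the operand stack (D-stack)
    icfl: int  # last entry in the translated code


@dataclass
class HeadEntry:
    head: str
    saved: SemanticEnv  # environment when this head was put on the stack


class HeadStack:
    def __init__(self):
        self.entries = []

    def push(self, head: str, env: SemanticEnv, replace_top: bool = False):
        """NSYN = 5 pushes a head; NSYN = 1 replaces the top head."""
        if replace_top:
            self.entries.pop()
        self.entries.append(HeadEntry(head, replace(env)))  # save a copy

    def unwind_to(self, index: int, env: SemanticEnv) -> None:
        """Remove everything above entries[index] and restore its environment."""
        del self.entries[index + 1:]
        saved = self.entries[index].saved
        env.rtop, env.dtop, env.icfl = saved.rtop, saved.dtop, saved.icfl
```

For example, if a statement after BEGIN | fails halfway, unwinding to the BEGIN | entry resets all three pointers, so the operands, operators, and code produced for the broken statement are effectively discarded.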
5. SYNTAX ERRORS IN A CONTEXT PARSER


A context parser determines that the present configuration is an
error configuration if it cannot compute, according to its tables, a
successor configuration. An error configuration

$$h_0 \ldots H,\;\; V,\;\; T\, y$$

can occur because

A. T is an error item in the source text, or

B. V is an error item in the source text, or

C. V represents a substring in the source text which contains
an error item, or

D. H represents a substring in the source text which contains
an error item.

For cases C and D, the error item occurred earlier but was not
recognized by the parser at that time. This is to be expected with a
parser using only limited context to determine the transition: enough for
syntactically correct source programs, but possibly insufficient for
syntactically incorrect ones. However, an error item will always
eventually cause the occurrence of an ungrammatical configuration base.

In the following examples, corresponding to the four cases listed
above, the error item is encircled and the error configuration is
indicated:

A. BEGIN X = 2 IF ...

B. BEGIN X = ... = 2;

C. BEGIN INTEGER ... 3 END FINIS

D. BEGIN INTEGER ... X = 1 STEP REAL X END FINIS

Error configurations:

A. BEGIN var = |, num, IF

B. BEGIN var = |, label-id, ...

C. h0 prog-hd SEMIC |, assign-e, END FINIS
   h0 |, block-hd, END FINIS

D. h0 prog-hd SEMIC FOR ... STEP |, D-DCL, END FINIS

The fact that an error situation may be recognized not when the
error item is read but possibly much later does not present a problem
in the TRIDENT Compiler. Using triples for computing transitions, this
case is rare anyway. In fact, it may be advantageous to delay recognition
of an error until the next construct starting with a left bracket
such as FOR, IF, or ( has been parsed. Pinpointing the error item to
the programmer may then not be quite as straightforward. An error message
may, for example, say:

SYNTAX ERROR AT ITEM PRECEDING MOST RECENT IF
EXPRESSION ENDING IN LINE .

In the TRIDENT Compiler, this technique is needed only for the case where
statements and expressions are found in the outermost block, where they
cannot occur legally.
6. ERROR RECOVERY IN CONJUNCTION WITH THE CONTEXT PARSER OF THE
TRIDENT COMPILER


This section describes in some detail the method used to recover
from syntax errors in preset statements, executable statements, and
expressions in the TRIDENT Compiler [6, 7]. Appendix A contains the
relevant grammar, Appendix B lists terminals together with their names,
and Appendix C contains the Error Recovery Table (HTMAP) representing
various classes of terminals, non-terminals, and heads.

When the parser recognizes a syntax error, the procedure ERRP is
entered. One of the following conditions will exist:

A. The pair H, T is ungrammatical, or

B. The pair H, T is grammatical but the triple H, V, T is
not, or

C. The triple H, V, T is grammatical but the transition to the
next configuration uses a "marked" head to form a new head or to form a
new non-terminal. This means that a substring was being parsed while the
parser was in an error condition. This substring may be represented by
V, or it may already have been discarded at this point, in which case V
is an input non-terminal. Thus, ERRP is capable of initiating the parsing
of a substring and of taking corrective action after that substring has
been parsed completely. The corrective action consists of reestablishing
the semantic environment as it existed when the current head was created.
This semantic environment is found on the head stack together with the
corresponding head. The non-terminal is left unchanged if it is an
input non-terminal; otherwise, it is replaced by

(1) D-DCL,

(2) number, or

(3) NIL,

depending on which one fits. They are tried in this order. If none fits,
then NIL is chosen and regular error processing takes over.
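The replacement rule for the non-terminal in case C can be sketched as follows. The function name and the two predicates (is_input_nt, is_legal) are hypothetical stand-ins for the utility procedures listed in Section 6.5, and the filler names are written as the strings "D-DCL", "NUMBER", and "NIL".

```python
from typing import Callable

# Filling non-terminals in the order in which ERRP tries them.
FILLERS = ("D-DCL", "NUMBER", "NIL")


def corrected_nonterminal(h: str, v: str, t: str,
                          is_input_nt: Callable[[str], bool],
                          is_legal: Callable[[str, str, str], bool]):
    """Return (non-terminal, legal) after ERRP's corrective action.

    legal is False when no filler makes (h, V, T) legal; NIL is then chosen
    and regular error processing has to take over.
    """
    if is_input_nt(v):
        return v, is_legal(h, v, t)  # input non-terminals are left unchanged
    for filler in FILLERS:
        if is_legal(h, filler, t):
            return filler, True
    return "NIL", False
```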

In general, error processing is done for classes of terminals, of
non-terminals, and of heads. These classes are defined according to the
syntactic characteristics of their elements as follows.
6.1 CLASSES OF TERMINALS


The following classes TC(i), i = 1, ..., 7, represent a partition of
all terminals. Terminals in the same class are treated alike.

TC(1) = {FINIS, END, SEMIC, ), IFEND, CASEEND}

TC(2) = {LOC, BITNOT, NOT}

TC(3) = {GOTO, RETURN, EXIT, LOOPEXIT, PROCEDURE, FOR, SWITCH, PRESET,
CASE, IF}

TC(4) = {BEGIN}

TC(5) = {(}

TC(6) = {LB}

TC(7) = {all remaining terminals}

Elements of TC(1) are "right brackets", which initiate a search on
the head stack for a corresponding left bracket. When a left bracket is
found that matches, the complete bracketed construct is effectively
removed from the configuration. This means, for example, that for the
pair (BEGIN |, SEMIC) the last statement is removed, and for the pair
(BEGIN |, END) the entire block is removed. If the pair (head, terminal)
does not match then, except when the head is a begin head, this is
interpreted as a complete construct with its right bracket missing. The
terminal in TC(1) that caused the backup is not removed in this case.
However, if the head is a begin head, then the incomplete statement,
including the current terminal, is removed.

The set TC(1) is partitioned again, each element being in its own
class except END and SEMIC, which are in one class together.

TBR(0) = {FINIS}

TBR(1) = {END, SEMIC}

TBR(2) = {)}

TBR(3) = {IFEND}

TBR(4) = {CASEEND}

Each class TBR(i) of terminal brackets corresponds to a class HDBR(i)
of matching head brackets as defined in Section 6.2.

Elements of TC(2) start a new expression that can occur in runtime
or compile-time (PRESET) statements. Elements of TC(3) start a new
statement, or possibly an expression, that is not allowed in PRESETs. For
elements of TC(3), the insertion of a semicolon is attempted first. If
that does not produce a legal configuration, then the current terminal
will be given a default head that allows parsing to continue, and the
current head on top of the head stack is marked. The technique of using
a default head that fits the current terminal is used for terminals
in most classes after certain tests have been made. The default head
is … for terminals in TC(2), WHILE | for LB, and BEGIN | otherwise.

BEGIN cannot be put into the class TC(3), since it may open a block
or continue a preset statement already started by the terminal PRESET.
Thus, BEGIN does not always start a new construct.

For the left parenthesis, two cases must be distinguished. First,
it may start a new construct, in which case the non-terminal preceding
it in the current configuration must be NIL. Second, it may start,
together with and depending on the non-terminal preceding it, a
subscripted variable, component variable, switch expression, or procedure
call. In both cases the current head on top of the head stack is marked,
and parsing will continue with BEGIN | as the default head for the first
case and with ( | for the second case.

Elements of TC(7) are discarded; that is, they are effectively removed
from the configuration, and the next terminal in the remainder string is
made the current one. If this new terminal was not preceded by a
non-terminal other than NIL, then the old non-terminal is kept; otherwise,
it is replaced by the new one. This new configuration is checked for
legality, or for whether it can be made legal by replacing the current
non-terminal by D-DCL, NUMBER, or NIL. If not, then it is treated
according to the classification of the current terminal described above.
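The discard step for TC(7) terminals can be sketched as follows. The function name and the is_legal predicate are hypothetical; the right context is represented as a tuple T V1 T1 y, as in the configurations of Section 3.

```python
NIL = "NIL"
FILLERS = ("D-DCL", "NUMBER", "NIL")


def discard_terminal(h, v, rest, is_legal):
    """Discard the current terminal rest[0], a member of TC(7).

    rest is the right context T V1 T1 y. Returns (V', rest', legal): V' is the
    old non-terminal unless the discarded T was followed by a non-terminal V1
    other than NIL; legal is False if neither V' nor any filler fits, in
    which case the new current terminal T1 is handled by its own class.
    """
    v1, new_rest = rest[1], rest[2:]
    if v1 != NIL:
        v = v1
    t1 = new_rest[0]
    for candidate in (v,) + FILLERS:
        if is_legal(h, candidate, t1):
            return candidate, new_rest, True
    return v, new_rest, False
```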


6.2 CLASSES OF HEADS

Each of the classes defined below represents the collection of those
heads that have absorbed a specific left bracket, as indicated in the
following table.


Head Class   Left Bracket Absorbed   Corresponding Right Bracket
HDBR(0)      -                       FINIS
HDBR(1)      BEGIN                   END or ;
HDBR(2)      (                       )
HDBR(3)      IF                      IFEND
HDBR(4)      CASE                    CASEEND
HDBR(5)      LB                      RB, same as )

HDC(0) = {h0}

HDC(1) = {PROG-HD SEMIC |, BLOCK-HD SEMIC |, BEGIN |,
PRESET-HD SEMIC |, PRESET BEGIN |}

HDC(2) = {LP |, ARRAY-ID LP |, STACK-ID LP |, PROC-ID LP |,
COMP-ID LP |, GOTO SWITCH-ID LP |, VAR-HD COMMA |,
COMP-HD COMMA |, FUN-EXP-HD COMMA |}

HDC(3) = {IF |, IF E THEN |, IF-HD COMMA |, IF-HD COMMA E THEN |,
IF-HD ELSE |}

HDC(4) = {CASE |, CASE E DO |, CASE-HD COMMA |}

HDC(5) = {WHILE E LB |, FOR VAR-ID = E STEP E UNTIL E LB |,
FOR VAR-ID = E STEP E WHILE E LB |,
FOR VAR-ID = E REPEAT E WHILE E LB |,
LOOP-HD1 COMMA |, LOOP-HD2 COMMA |,
LOOP-HD3 COMMA |, LOOP-HD4 COMMA |}

All elements of one head class will match one specific right bracket,
in one case two right brackets (END and ;). Conversely, to each right
bracket there corresponds one unique head class, with the exception of
the right parenthesis. The terminals ) and RB have not been distinguished
syntactically; hence, the conflict concerning their head classes must be
resolved by a special test. A situation like this should be avoided in
the design of a grammar.
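The right-bracket backup of Section 6.1 can be sketched over these head classes. This is an interpretation, not the TRIDENT routine: the head stack is reduced to a list of HDBR class numbers (None for heads that absorbed no left bracket), a head whose bracket does not match is treated as a construct with its right bracket missing and the search continues below it, and the special test for ) versus RB is simplified to "the nearest HDBR(2) or HDBR(5) head wins".

```python
# TBR classes of right-bracket terminals (Section 6.1).
TBR = {"FINIS": 0, "END": 1, "SEMIC": 1, ")": 2, "IFEND": 3, "CASEEND": 4}
BEGIN_CLASS = 1  # HDBR(1): heads that absorbed BEGIN


def backup(head_classes, terminal):
    """Search the head stack for the left bracket matching a TC(1) terminal.

    head_classes[i] is the HDBR class of the i-th head (bottom first), or
    None for a head that absorbed no left bracket. Returns (index, action).
    """
    wanted = TBR[terminal]
    for i in range(len(head_classes) - 1, -1, -1):
        cls = head_classes[i]
        if cls is None:
            continue                      # inside a construct; keep searching
        if cls == wanted or (wanted == 2 and cls == 5):   # RB is written as )
            return i, "remove bracketed construct"
        if cls == BEGIN_CLASS:
            return i, "remove incomplete statement and terminal"
        # Right bracket of this construct is missing: close it, keep terminal.
    return None, "no matching head"
```

For example, an END that arrives while an IF construct is still open closes the IF implicitly and then removes the whole block opened by the BEGIN head beneath it.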


6.3 CLASSES OF NON-TERMINALS

For language constructs that have a repetitive substructure, such as
IF expressions, CASE expressions, or blocks, the grammar is designed such
that a non-terminal represents an initial string to which, recursively,
a new substructure can be added, or which can be closed by a "right
bracket" terminal. For example:

CASE-HD → CASE E DO E-ST

CASE-HD → CASE-HD COMMA E-ST

CASE-E → CASE-HD CASEEND

Therefore, non-terminals of this kind must not be discarded without
having been closed by a right bracket. The classes NTC(1) and NTC(2)
are of this type. A closer look at the grammar shows that such a
non-terminal U is always formed by a transition using $U \rightarrow H V$
from a configuration

$$h'\, H,\;\; V,\;\; T\, y$$

where H, V, T was a grammatical triple; otherwise, this transition could
not have been made. Therefore, the resulting configuration

$$h',\;\; U,\;\; T\, y$$

is in error only because of h'. The proper treatment, then, is to supply
a default head for U, T and to complete the current construct whose
initial segment is represented by U. After that, the error in h' will
come up again. The default head for elements of NTC(1) is BEGIN |, and
for elements of NTC(2) it is FUN-EXP-HD COMMA |.

NTC(1) = {PROG-HD, BLOCK-HD, PRESET-HD, IF-HD, CASE-HD, VAR-HD,
COMP-HD, FUN-EXP-HD}

NTC(2) = {LOOP-HD1, LOOP-HD2, LOOP-HD3, LOOP-HD4}


Finally, the following class

NTLP(1) = {ARRAY-ID, STACK-ID, PROC-ID, SWITCH-ID, COMP-ID}

contains those non-terminals that can precede a left parenthesis and must
be associated with this left parenthesis to start a new construct. No
other non-terminal apart from the filler NIL can precede a left
parenthesis.
6.4 SINGULAR CASE

All non-terminals in NTC(1) and NTC(2) have absorbed a left bracket.
The case can also occur that a non-terminal absorbs a right bracket. This
happens only once in the THLL grammar:

LABEL-END → LABEL-ID : END

When an error occurs with LABEL-END in the non-terminal position, the
non-terminal is replaced by NIL and an END is inserted. This corresponds
to replacing a labeled END by an unlabeled END.


6.5 IMPLEMENTATION CONSIDERATIONS


The classes of terminals, non-terminals, and heads defined above can
be implemented economically and accessed very efficiently using a single
one-dimensional array HTMAP containing N words, where N = max(number of
heads, number of non-terminals, number of terminals). Each class name
C corresponds to a field in a word. If I is the numeric code for a
terminal, a non-terminal, or a head, then

$$
C[\mathrm{LOC}\;\mathrm{HTMAP}(I)] =
\begin{cases}
J & \text{if } I \text{ belongs to } C(J),\\
0 & \text{otherwise.}
\end{cases}
$$
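The field access above can be illustrated with bit masks. In this sketch the field layout, the tiny value of N, and the terminal class name `TC` are all hypothetical; the report does not give TRIDENT's actual layout. Because N is a maximum rather than a sum, word I is presumably shared by the terminal, the non-terminal, and the head that all have code I, with each class field interpreting I only within its own category.

```python
# Hypothetical field layout: class name -> (shift, width) in a 32-bit word.
# "TC" stands for an invented terminal class; NTC and NTLP are from the text.
FIELDS = {"NTC": (0, 2), "NTLP": (2, 1), "TC": (3, 3)}
N = 8                        # max(#heads, #non-terminals, #terminals); tiny demo value
HTMAP = [0] * N
WORD_MASK = 0xFFFFFFFF       # keep every entry within 32 bits


def set_class(c, i, j):
    """Record that item code i belongs to C(j); j = 0 clears membership."""
    shift, width = FIELDS[c]
    if not 0 <= j < (1 << width):
        raise ValueError("subclass index does not fit in the field")
    mask = ((1 << width) - 1) << shift
    HTMAP[i] = ((HTMAP[i] & ~mask) | (j << shift)) & WORD_MASK


def get_class(c, i):
    """C[LOC HTMAP(I)]: J if item code i belongs to C(J), else 0."""
    shift, width = FIELDS[c]
    return (HTMAP[i] >> shift) & ((1 << width) - 1)
```

A single shift and mask per lookup is what makes this encoding cheap to access, while packing all classes into one word per code keeps the table small.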


In the same manner, default heads associated with terminals can be
encoded in this array. Alternatively, default heads associated with a
class can be encoded directly in the field for that class. This is done
in the TRIDENT Compiler for NTC: the field for NTC contains the encoding
of BEGIN for all elements of NTC(1) and the encoding of FUN-EXP-HD COMMA
for all elements of NTC(2). In the TRIDENT Compiler, HTMAP is an array
of 74 32-bit words. This array also contains other information,
indicated by SCHEMA. It is used for generating a file that is needed by
a Symbolic Debug System TO ADC [8].
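Encoding the default head directly in the NTC field means that a nonzero field value both marks membership and supplies the symbols to insert. The sketch below assumes a 16-bit field holding up to two 8-bit symbol codes; the numeric codes are hypothetical and must be nonzero, so that 0 can still mean "not in NTC".

```python
# Hypothetical nonzero symbol codes; TRIDENT's real codes are not given.
CODE = {"BEGIN": 1, "COMMA": 7, "FUN-EXP-HD": 40}
NAME = {v: k for k, v in CODE.items()}


def encode_default_head(symbols):
    """Pack one or two 8-bit symbol codes into a 16-bit NTC field value."""
    if not 1 <= len(symbols) <= 2:
        raise ValueError("the field holds one or two symbols")
    value = 0
    for s in symbols:
        value = (value << 8) | CODE[s]
    return value


def decode_default_head(value):
    """Inverse of encode_default_head; a value of 0 yields [] (not in NTC)."""
    symbols = []
    while value:
        symbols.append(NAME[value & 0xFF])
        value >>= 8
    return list(reversed(symbols))
```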

Besides such a classification device for terminals, non-terminals,
and heads, a set of basic utility procedures is needed for at least the
following purposes:

A. To check whether a given pair (head, terminal) or a given triple
(head, non-terminal, terminal) is legal, and to compute the transition
vector for a given triple.

B. To allow the replacement of the current non-terminal by a
filling non-terminal that makes the resulting triple legal.

C. To provide a general capability of inserting items into the
sequence of items as seen by the parser.

D. To provide debugging aids that allow, depending on options, the
printing of parsing cycles (configuration base and transition vector),
of the entire head stack, the entire D-stack, the entire R-stack, and
of selected portions of the translated code.
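Purposes A to C can be sketched as small routines. In this sketch the legality table `LEGAL`, its entries, and the opaque transition-vector placeholders `"tv1"` and `"tv2"` are invented for illustration; real entries would come from the generated parsing tables. The names `transition`, `find_filler`, and `ItemStream` are likewise hypothetical.

```python
from collections import deque

# Hypothetical legality table: legal (head, non-terminal, terminal) triple
# -> its transition vector (opaque placeholder strings here).
LEGAL = {
    ("BLOCK-HD", "NIL", "END"): "tv1",
    ("IF-HD", "EXP", "THEN"): "tv2",
}


def transition(head, nt, term):
    """Purpose A: transition vector of a legal triple, or None if illegal."""
    return LEGAL.get((head, nt, term))


def find_filler(head, term, fillers=("NIL",)):
    """Purpose B: first filling non-terminal that makes (head, filler, term)
    legal, or None if there is none."""
    for f in fillers:
        if (head, f, term) in LEGAL:
            return f
    return None


class ItemStream:
    """Purpose C: the item sequence as seen by the parser, with a buffer
    through which recovery code can insert items ahead of the input."""

    def __init__(self, items):
        self._source = iter(items)
        self._inserted = deque()

    def insert(self, *items):
        """Make `items` the next items delivered, in the order given."""
        self._inserted.extendleft(reversed(items))

    def next_item(self):
        """Return the next item, or None at the end of the input."""
        if self._inserted:
            return self._inserted.popleft()
        return next(self._source, None)
```

Keeping inserted items in a separate buffer leaves the scanner untouched: recovery code only calls `insert`, and the parser keeps reading through `next_item`.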


All these routines, and some other, more specialized ones, are employed
in the syntactic error recovery procedure of the TRIDENT Compiler.


7. CONCLUSION


This paper describes some principles on which the recovery from
syntactic errors encountered in a context parser is based. Both parsing
and error recovery are table driven. This provides a unified and reliable
approach for handling a wide variety of problems that can be formulated
as the translation of a language described by an operator grammar.

In the TRIDENT Compiler, the table HTMAP used in error recovery
was constructed by hand, while the parsing tables were generated from a
grammar by a general program. From the definitions of the various
classes in the previous sections, it can be seen that these classes can
be formed and encoded into an error recovery table automatically from
the grammar and a high-level description of some characteristics
concerning the meaning of grammar symbols. It is planned to
incorporate the construction of an error recovery table into the LISP
program that currently generates the parsing tables.


TRIDENT STATEMENT AND EXPRESSION GRAMMAR


The grammar as listed in this appendix represents the input data
for a LISP program that produces parsing tables. It consists of the
following six groups of data:

(1) List of options

(2) List of terminals; they will be assigned a numerical
code in the order in which they appear

(3) Goal, the highest syntactic unit

(4) List of productions together with semantic action numbers

(5) List of defined terminals

(6) List of input non-terminals

As usual in LISP, a list of items is written as

(X1 X2 X3 ...)

where each item Xi can be a symbol, a number, or a list. A production

Y → X1 X2 ... Xn

is represented as

(Y X1 X2 ... Xn)

Each production is followed by the list of its associated semantic action
numbers. The action number 10 specifies no action.